{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Downloading Objectron Data\n",
    "This tutorial covers how to download/use the Objectron datasets. \n",
    "There are three ways you can download the objectron data to you disks. \n",
    "- Use `gsutil`\n",
    "- Download via Public HTTP API\n",
    "- Download using Cloud Python client.\n",
    "\n",
    "Keep in mind you can always directly consume the dataset from the Google cloud bucket using our [tf.data.Dataset](https://github.com/google-research-datasets/Objectron/blob/master/notebooks/Hello%20World.ipynb) or [torch_xla.utils.tf_record_reader](https://github.com/google-research-datasets/Objectron/blob/master/notebooks/Objectron_Pytorch_tutorial.ipynb) without copying the data to your local machine. See the tutorial notebooks for more details.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Locations\n",
    "The data is stored in the `objectron` bucket on Google Cloud storage. and includes the following assets:\n",
    "\n",
    "- The video sequences (located in `gs://objectron/videos/class/batch-i/j/video.MOV` files)\n",
    "- The annotation labels containing the 3D bounding boxes for objects stored in `gs://objectron/annotations`. The annotation protobufs are located in `/videos/class/batch-i/j/geometry.pbdata` files. They are formatted using the object.proto.\n",
    "- AR metadata (such as camera poses, point clouds, and planar surfaces).\n",
    "- Processed dataset: sharded and shuffled `tf.records` of the annotated frames, in tf.example format. These are used for creating the input data pipeline to your models. These files are located in `gs://objectron/v1/records_sharded/class/`\n",
    "- The index of all available samples, as well as train/test splits for easy access and download. For each category, first you need to get the index for the available files. Copies of the indices are stored in the objectron bucket (`objectron/v1/index`) as well as the [github repo](https://github.com/google-research-datasets/Objectron/tree/master/index) under `index` folder. There are three files: class_annotations, and the 20/80 test/train split: class_annotations_test and class_annotations_train. Each file contains multiple lines with the format `class/batch-i/j`. Combine this with the root directory of the Objectron to get the key for videos and annotations.\n",
    "\n",
    "For example, for public URLs:\n",
    "* annotation file: https://storage.googleapis.com/objectron/annotations/class/batch-i/j.pbdata\n",
    "* video file: https://storage.googleapis.com/objectron/videos/class/batch-i/j/video.MOV\n",
    "* AR metadata: https://storage.googleapis.com/objectron/videos/class/batch-i/j/geometry.pbdata"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Downloading Data using gsutil\n",
    "\n",
    "`gsutil` is the small utility to execute shell commands like ls and cp on the google storage bucket. \n",
    "For example you can use\n",
    "\n",
    "```\n",
    "gsutil ls gs://objectron/v1/records_shuffled\n",
    "```\n",
    "to see the available classes in the dataset. Similarly, the easiest way to copy files to the local machine would be\n",
    "```\n",
    "gsutil cp -r gs://objectron/v1/records_shuffled local_dataset_dir\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Downloading Data using Public HTTP API\n",
    "The users can download data without authentication directly using HTTP address. The dataset's public URL is `https://storage.googleapis.com/objectron`. You can use curl, request or any other downloader for this purpose.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "public_url = \"https://storage.googleapis.com/objectron\"\n",
    "blob_path = public_url + \"/v1/index/cup_annotations_test\"\n",
    "video_ids = requests.get(blob_path).text\n",
    "video_ids = video_ids.split('\\n')\n",
    "# Download the first ten videos in cup test dataset\n",
    "for i in range(1):\n",
    "    video_filename = public_url + \"/videos/\" + video_ids[i] + \"/video.MOV\"\n",
    "    metadata_filename = public_url + \"/videos/\" + video_ids[i] + \"/geometry.pbdata\"\n",
    "    annotation_filename = public_url + \"/annotations/\" + video_ids[i] + \".pbdata\"\n",
    "    # video.content contains the video file.\n",
    "    video = requests.get(video_filename)\n",
    "    metadata = requests.get(metadata_filename)\n",
    "    annotation = requests.get(annotation_filename)\n",
    "    file = open(\"example.MOV\", \"wb\")\n",
    "    file.write(video.content)\n",
    "    file.close()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Downloading data using cloud storage API\n",
    "The third option is to download the data using Google cloud storage Python API. Alternatively you can use the google.cloud [Python API](https://cloud.google.com/storage/docs/downloading-objects#storage-download-object-python) to download the files using Python. For more information, see the [Cloud Storage Python API reference documentation](https://cloud.google.com/storage/docs/reference/libraries).\n",
    "\n",
    "Note the cloud storage API requires you to authenticate before downloading the dataset. Refer to [authentication](https://googleapis.dev/python/google-api-core/latest/auth.html) on how to set your credentials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install google-cloud-storage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import storage\n",
    "\n",
    "def download_blob(bucket_name, \n",
    "                  source_blob_name, \n",
    "                  destination_file_name):\n",
    "    \"\"\"Downloads a blob from the bucket.\"\"\"\n",
    "    storage_client = storage.Client()\n",
    "    bucket = storage_client.bucket(bucket_name)\n",
    "    blob = bucket.blob(source_blob_name)\n",
    "    blob.download_to_filename(destination_file_name)\n",
    "    print(\n",
    "        \"Blob {} downloaded to {}.\".format(\n",
    "            source_blob_name, destination_file_name\n",
    "        )\n",
    "    )\n",
    "download_blob('objectron', 'v1/index/cup_annotation_train',  './cup_annotation_train')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
